Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented May 22, 2025

Today the output of attribute list is not limited in any way.

This results in a following tree when running wide schema benchmark with trace logging enabled:

Logs
[2025-05-07T22:12:02,802][DEBUG][o.e.x.e.s.EsqlSession    ] [runTask-0] Optimized physical plan:
LimitExec[1[INTEGER]]
\_ExchangeExec[[@timestamp{f}#2227832, field_0_0{f}#2228436, field_0_1{f}#2228433, field_0_10{f}#2227471, field_0_100{f}#2227691, field_0_101{f}#2227694, field_0_102{f}#2227692, field_0_103{f}#2227754, field_0_104{f}#2227752, field_0_105{f}#2227758, field_0_106{f}#2227756, field_0_107{f}#2227762, field_0_108{f}#2227760, field_0_109{f}#2227765, field_0_11{f}#2227478, field_0_110{f}#2227726, field_0_111{f}#2227725, field_0_112{f}#2227730, field_0_113{f}#2227728, field_0_114{f}#2227030, field_0_115{f}#2227029, field_0_116{f}#2227032, field_0_117{f}#2227031, field_0_118{f}#2227034, field_0_119{f}#2227033, ..., field_4_84{f}#2227171, field_4_85{f}#2227172, field_4_86{f}#2227173, field_4_87{f}#2227174, field_4_88{f}#2227179, field_4_89{f}#2227180, field_4_9{f}#2228474, field_4_90{f}#2227181, field_4_91{f}#2227186, field_4_92{f}#2227187, field_4_93{f}#2227188, field_4_94{f}#2227189, field_4_95{f}#2227182, field_4_96{f}#2227183, field_4_97{f}#2227184, field_4_98{f}#2227185, field_4_99{f}#2227190, id{f}#2229085, key_1000{f}#2227644, key_100000{f}#2228620, key_1000000{f}#2229128, key_100000000{f}#2228453, key_200000{f}#2227833, key_500000{f}#2227563, key_5000000{f}#2228484, lookup_keyword_0{f}#2227620, lookup_keyword_0.keyword{f}#2227621, lookup_keyword_1{f}#2227617, lookup_keyword_1.keyword{f}#2227618, lookup_keyword_2{f}#2227614, lookup_keyword_2.keyword{f}#2227615, lookup_keyword_3{f}#2227611, lookup_keyword_3.keyword{f}#2227612, lookup_keyword_4{f}#2227608, lookup_keyword_4.keyword{f}#2227609, lookup_keyword_5{f}#2227605, lookup_keyword_5.keyword{f}#2227606, lookup_keyword_6{f}#2227602, lookup_keyword_6.keyword{f}#2227603, lookup_keyword_7{f}#2227599, lookup_keyword_7.keyword{f}#2227600, lookup_keyword_8{f}#2227596, lookup_keyword_8.keyword{f}#2227597, lookup_keyword_9{f}#2227593, lookup_keyword_9.keyword{f}#2227594],false]
\_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[@timestamp{f}#2227832, field_0_0{f}#2228436, field_0_1{f}#2228433, field_0_10{f}#2227471, field_0_100{f}#2227691, field_0_101{f}#2227694, field_0_102{f}#2227692, field_0_103{f}#2227754, field_0_104{f}#2227752, field_0_105{f}#2227758, field_0_106{f}#2227756, field_0_107{f}#2227762, field_0_108{f}#2227760, field_0_109{f}#2227765, field_0_11{f}#2227478, field_0_110{f}#2227726, field_0_111{f}#2227725, field_0_112{f}#2227730, field_0_113{f}#2227728, field_0_114{f}#2227030, field_0_115{f}#2227029, field_0_116{f}#2227032, field_0_117{f}#2227031, field_0_118{f}#2227034, field_0_119{f}#2227033, field_0_12{f}#2227479, field_0_120{f}#2227014, ..., field_4_98{f}#2227185, field_4_99{f}#2227190, id{f}#2229085, key_1000{f}#2227644, key_100000{f}#2228620, key_1000000{f}#2229128, key_100000000{f}#2228453, key_200000{f}#2227833, key_500000{f}#2227563, key_5000000{f}#2228484, lookup_keyword_0{f}#2227620, lookup_keyword_0.keyword{f}#2227621, lookup_keyword_1{f}#2227617, lookup_keyword_1.keyword{f}#2227618, lookup_keyword_2{f}#2227614, lookup_keyword_2.keyword{f}#2227615, lookup_keyword_3{f}#2227611, lookup_keyword_3.keyword{f}#2227612, lookup_keyword_4{f}#2227608, lookup_keyword_4.keyword{f}#2227609, lookup_keyword_5{f}#2227605, lookup_keyword_5.keyword{f}#2227606, lookup_keyword_6{f}#2227602, lookup_keyword_6.keyword{f}#2227603, lookup_keyword_7{f}#2227599, lookup_keyword_7.keyword{f}#2227600, lookup_keyword_8{f}#2227596, lookup_keyword_8.keyword{f}#2227597, lookup_keyword_9{f}#2227593, lookup_keyword_9.keyword{f}#2227594]]
\_Limit[1[INTEGER],false]
\_EsRelation[idx_*][@timestamp{f}#2227832, field_0_0{f}#2228436, field_..]<>]]

I had to manually truncate the output as github was complaining that pr description is >= 65kb with 2500 attributes listed twice in the tree (in _ExchangeExec and in Project).

Ideally we should handle that while building generic property but for now this limits output of the ones I noticed.

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels May 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@shmuelhanoch
Copy link
Contributor

@idegtiarenko shouldn't TO_STRING_MAX_WIDTH / TO_STRING_MAX_PROS from Node solve exactly this problem? why we need an additional limitation logic?

And anyway, how hard would it be to just fix it in propertiesToString? Even as a temporary solution, it's a little strange to have multiple override implementations for nodes that uses different limitation logics.

@idegtiarenko
Copy link
Contributor Author

@idegtiarenko shouldn't TO_STRING_MAX_WIDTH / TO_STRING_MAX_PROS from Node solve exactly this problem? why we need an additional limitation logic?

It looks like that however it is actually not solving this issue (as you can see the plan still contains very long output).
If you look closely at

if (maxWidth + stringValue.length() > TO_STRING_MAX_WIDTH) {
int cutoff = Math.max(0, TO_STRING_MAX_WIDTH - maxWidth);
sb.append(stringValue.substring(0, cutoff));
sb.append("\n");
stringValue = stringValue.substring(cutoff);
maxWidth = 0;
}
maxWidth += stringValue.length();
sb.append(stringValue);

it only inserts \n once if the last stringValue makes the line longer than the limit.
It does not trim output nor break the line multiple times.

And anyway, how hard would it be to just fix it in propertiesToString? Even as a temporary solution, it's a little strange to have multiple override implementations for nodes that uses different limitation logics.

I would prefer to fix 2 affected places for now. Generic one could be fixed later that would also allow us remove all custom nodeString implementation. Probably it is worth doing it if we decide to implement Characterization test that depend on the tree output. It might also be quiet tricky if we decide to be a bit more careful with long string descriptions (eg, toString(prop) produces a long string that is later substring. Why not pass string builder inside and only append required values instead?)

@idegtiarenko idegtiarenko requested a review from shmuelhanoch May 26, 2025 08:14
@astefan astefan requested a review from costin May 26, 2025 09:15
@alex-spies
Copy link
Contributor

alex-spies commented Jun 2, 2025

It looks to me like Node#propertiesToString doesn't account for the fact that often, properties are themselves lists - so while it accounts for too many props, it doesn't account for a single prop being a humongous list.

Wouldn't it be nicer to detect this case inside Node#propertiesToString and truncate properties that are long lists, rather than special-casing just Project and ExchangeExec?

Edit:
More specifically, it looks like this part of Node#toString could be consolidated with the limitedToString method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants